Small in Size, Big in Precision: A Case for Using Language-Specific Lexical Resources for Word Sense Disambiguation

Authors

  • Steven Neale
  • João Silva
  • António Branco

Abstract

Linked open data (LOD) presents an ideal platform for connecting the multilingual lexical resources used in natural language processing (NLP) tasks, but the use of machine translation to fill in gaps in lexical coverage for resource-poor languages means that large amounts of data are potentially unverified. For graph-based word sense disambiguation (WSD), one approach has been to first translate terms into English in order to disambiguate using richer, fuller lexical knowledge bases (LKBs) such as WordNet. In this paper, we show that this approach actually creates more ambiguity and is far less accurate than using language-specific resources, which, regardless of their smaller size, can provide results comparable in accuracy to the state-of-the-art reported for graph-based WSD in English. For LOD, this demonstrates the importance of continuing to grow and extend language-specific resources in order to continually verify and reintegrate them as accurate resources.
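The abstract does not say which graph algorithm is used. The following is therefore only a minimal Python sketch under stated assumptions. It uses Personalized PageRank, a common choice for graph-based WSD that is assumed here rather than taken from the paper, over two tiny, entirely hypothetical lexical graphs. The Portuguese-looking words, sense IDs such as `bank:finance`, the translation table and the edges are all invented for illustration. The sketch shows the mechanism behind the ambiguity claim: when a word is routed through English, it collects the senses of every translation, so the candidate set grows before disambiguation even starts.

```python
"""Toy sketch: graph-based WSD with Personalized PageRank, comparing a
language-specific lexicon against a translate-to-English route.
All words, sense IDs, translations and edges are hypothetical."""


def build_graph(edges):
    """Turn (a, b) sense pairs into an undirected adjacency dict."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration PPR; restart mass is spread uniformly over `seeds`.
    Assumes every node has at least one neighbour (no dangling nodes)."""
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * restart[n] for n in graph}
        for node, neighbours in graph.items():
            share = damping * rank[node] / len(neighbours)
            for nb in neighbours:
                new_rank[nb] += share
        rank = new_rank
    return rank


def ambiguity(candidates):
    """Mean number of candidate senses per word."""
    return sum(len(s) for s in candidates.values()) / len(candidates)


def disambiguate(candidates, graph):
    """Seed PPR with every candidate sense in the context, then pick the
    highest-ranked candidate for each word."""
    seeds = {s for senses in candidates.values() for s in senses}
    rank = personalized_pagerank(graph, seeds)
    return {w: max(senses, key=lambda s: rank.get(s, 0.0))
            for w, senses in candidates.items()}


# Route 1: a hypothetical language-specific LKB.
native_candidates = {
    "banco": ["banco.instituicao", "banco.assento"],
    "dinheiro": ["dinheiro.moeda"],
}
native_graph = build_graph([
    ("banco.instituicao", "dinheiro.moeda"),
    ("banco.assento", "parque.lugar"),
])

# Route 2: translate to English, then take all senses of every translation.
translations = {"banco": ["bank", "bench"], "dinheiro": ["money"]}
english_senses = {
    "bank": ["bank:finance", "bank:river", "bank:row"],
    "bench": ["bench:seat", "bench:court"],
    "money": ["money:currency"],
}
english_graph = build_graph([
    ("bank:finance", "money:currency"),
    ("bank:river", "river:land"),
    ("bank:row", "row:line"),
    ("bench:seat", "park:place"),
    ("bench:court", "court:law"),
])
translated_candidates = {
    w: [s for t in ts for s in english_senses[t]]
    for w, ts in translations.items()
}

print(ambiguity(native_candidates))      # 1.5
print(ambiguity(translated_candidates))  # 3.0
print(disambiguate(native_candidates, native_graph))
print(disambiguate(translated_candidates, english_graph))
```

Both routes pick the financial sense in this toy setup, so the example says nothing about accuracy. It only makes the growth in candidate senses (1.5 versus 3.0 per word) visible. Seeding with every candidate sense in the context is the simplest Personalized PageRank variant. A word-by-word variant would instead exclude the target word's own senses from the seed set.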


Similar resources

Design and implementation of Persian spelling detection and correction system based on Semantic

Persian Language has a special feature (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors. Also developing Persian tools will provide Persian progr...

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of English language, in which, nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

Published vs. Postgraduate Writing in Applied Linguistics: The Case of Lexical Bundles

Abstract: Lexical bundles, as building blocks of coherent discourse, have been the subject of much research in the last two decades. While many of such studies have been mainly concerned with exploring variations in the use of these word sequences across different registers and disciplines, very few have addressed the use of some particular groups of lexical bundles within some gen...

Unsupervised Disambiguation for a Multilingual Medical Information System using UMLS

This paper describes techniques for unsupervised word sense disambiguation of English and German medical documents using the Unified Medical Language System (UMLS). We present both monolingual techniques which rely only on the structure of UMLS, and bilingual techniques which also rely on the availability of parallel corpora. The best results are obtained using relationships between terms given...

Semi-Automatic Extension of Large-Scale Linguistic Knowledge Bases

Linguistic resources are essential for the success of many AI tasks. Building a new lexical resource from scratch or combining heterogeneous resources is not only complex and time-consuming, but can also lead to knowledge inconsistency and redundancy. In this paper, we present a novel method for the large-scale semantic enrichment of a computational linguistic resource. To this end, with the ai...

Publication date: 2015